EN FR
EN FR


Section: New Results

Formal verification of compilers and static analyses

The Compcert verified compiler for the C language

Participants : Xavier Leroy, Sandrine Blazy [project-team Celtique] , Alexandre Pilkiewicz.

In the context of our work on compiler verification (see section  3.3.1 ), since 2005 we have been developing and formally verifying a moderately-optimizing compiler for a large subset of the C programming language, generating assembly code for the PowerPC, ARM, and x86 architectures [5] . This compiler comprises a back-end part, translating the Cminor intermediate language to assembly and reusable for source languages other than C [4] , and a front-end translating the CompCert C subset of C to Cminor. The compiler is mostly written within the specification language of the Coq proof assistant, from which Coq's extraction facility generates executable Caml code. The compiler comes with a 50000-line, machine-checked Coq proof of semantic preservation establishing that the generated assembly code executes exactly as prescribed by the semantics of the source C program.

This year, we improved the Compcert C compiler in several ways:

  • The formal semantics for the CompCert C source language was made executable and turned into a reference interpreter. This interpreter is proved sound and complete with respect to the formal semantics. It makes it possible to animate the semantics on test programs, identifying undefined behaviors and enumerating all possible execution orders. Another application is to provide an experimental validation of the semantics itself.

  • The top-level statements of compiler correctness were strengthened. In particular, semantic preservation is shown to hold even in the presence of a non-deterministic execution context. Also, we showed that if the source program goes wrong after performing some input/output actions, the compiled code performs at least these actions before continuing with an arbitrary behavior.

  • A new optimization pass, redundant reload optimization, was added, improving performance by up to 10% on the x86 architecture.

  • A general annotation mechanism was added to observe the values of local program variables at user-specified program points, such observations being guaranteed to produce the same results in the source code and in the compiled code. These annotations can be used to improve the precision of worst-case execution time (WCET) analysis over the compiled code. They can also provide stronger evidence of traceability for code qualification purposes.

Three versions of the CompCert development were publically released, integrating these improvements: versions 1.8.1 in March, 1.8.2 in April, and 1.9 in August.

In parallel, we continued our collaboration with Jean Souyris, Ricardo Bedin França and Denis Favre-Felix at Airbus. They are conducting an experimental evaluation of CompCert's usability for avionics software, and studying the regulatory issues (DO-178 certification) surrounding the potential use of CompCert in this context. Preliminary results were reported at the Predictability and Performance in Embedded Systems workshop [20] . More detailed results will be presented at the 2012 Embedded Real-Time Software and Systems conference (ERTS'12) [19] .

Formal specification and verified compilation of C++

Participants : Tahina Ramananandro, Gabriel Dos Reis [Texas A&M University] , Xavier Leroy.

This year, under Xavier Leroy's supervision and with precious C++ advice from Gabriel Dos Reis, Tahina Ramananandro tackled the issue of formally specifying object construction and destruction in multiple-inheritance languages, especially the C++ flavour featuring non-virtual and virtual inheritance (allowing repeated and shared base class subobjects), and also structure array fields. This formalization consists in specifying, in Coq, a small-step operational semantics for a subset of C++ featuring multiple inheritance, static and dynamic casts, field accesses, and object construction and destruction, and mechanically proving properties about resource management, thus obtaining a formal account of the RAII (Resource Acquisition is Initialization) principle. Moreover, this formalization also studies the impact of object construction and destruction on the behaviour of dynamic operations such as virtual function dispatch, introducing the notion of generalized dynamic type. These results were accepted for publication at the POPL 2012 symposium [29] .

Finally, this formalization includes a verified realistic compiler for this subset of C++ to a CFG-style 3-address intermediate language featuring low-level memory accesses in the style of the CompCert RTL language. Following usual compilation schemes and techniques inspired from the Common Vendor ABI for Itanium (which has since been reused and adapted by GNU GCC), the target language additionally features virtual tables to model object-oriented features, and virtual table tables to model the generalized dynamic type changes during object construction and destruction. This verified compiler reuses and extends the results of a previous work on verified C++ object layout by Tahina Ramananandro, Gabriel Dos Reis and Xavier Leroy published this year at the POPL 2011 symposium [28] .

Validation of polyhedral optimizations

Participants : Alexandre Pilkiewicz, François Pottier.

The polyhedral representation of loop nests with affine bounds is a unified way to compute and represent a large set of optimizations, including loop fusion, skewing, splitting, peeling, tilling etc. Polyhedral optimizers usually rely on heavily optimized C tools and libraries to manipulate polyhedrons. Those C libraries are, like any other programs, bug prone, which can easily lead to erroneous optimizations.

Those two facts—powerful yet error prone—make the formal proof of such optimizations appealing. Proving a full optimizer however would probably be unrealistic: the proof would be terribly challenging, but even writing in Coq an optimizer efficient enough to handle non trivial loop nest might be impossible.

Another option is to write and prove in Coq a validator: after each run of the unproved optimizer—considered as a black box—the validator is used to compare the program before and after optimization to make sure that its semantics—the meaning of the program—has not been change. If the validator does not report an error, we have formal certitude that no bug has been introduced by the optimization.

Alexandre Pilkiewicz, under François Pottier's supervision, has implemented and proved in Coq such a validator.

A formally-verified parser for CompCert

Participants : Jacques-Henri Jourdan, François Pottier, Xavier Leroy.

During a 6-month Master's internship (M2), Jacques-Henri Jourdan built a formally-verified parser for the C99 language. This parser was obtained through a general method for checking that an LR(1) parser produced by the parser generator Menhir is correct and complete, that is, it conforms exactly to the specification represented by the context-free grammar. This check is carried out by a validator that is implemented in Coq and proved correct, so that, in the end, there is no need to trust Menhir. A paper describing this work was accepted for presentation at the ESOP 2012 conference [24] .

Formal verification of an alias analysis

Participants : Valentin Robert, Xavier Leroy.

As part of his 5-month Master's internship, Valentin Robert developed and proved correct a static analysis for pointers and non-aliasing. This alias analysis is intraprocedural and flow-sensitive, and follows the “points-to” approach of Andersen [40] . An originality of this alias analysis is that it is conducted over the RTL intermediate language of the CompCert compiler: since RTL is essentially untyped, the traditional approaches to field sensitivity do not apply, and are replaced by a simple but effective tracking of the numerical offsets of pointers with respect to their base memory blocks. Using the Coq proof assistant and techniques inspired from abstract interpretation, Valentin Robert proved the soundness of his alias analysis against the operational semantics of RTL.